LA Crime

Los Angeles, the second-largest city in the United States, is known for Hollywood and beautiful weather. Crime might not be the first thing that comes to mind when you think of the city, but LA is no stranger to it. Data is no different from any other content collected or curated by humans: it isn’t always easy to digest, so I’m including a trigger warning because this post contains distressing material.

Trigger warning: Crime, Violence (Physical and Sexual)

I hope this project shows how quickly you can get results without pointing and clicking in Microsoft Excel. Please check the RMD file for the code.

Loading a file

library(readr)  # provides read_csv()
# Name things well: don't reuse the names of datasets already in a preloaded package (e.g. iris).
LAcrimesdata <- read_csv("~/Documents/LACrime/Crime_Data_From_2010_to_Present.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Crime Code` = col_integer(),
##   `Victim Age` = col_integer(),
##   `Premise Code` = col_integer(),
##   `Weapon Used Code` = col_integer(),
##   `Crime Code 1` = col_integer(),
##   `Crime Code 2` = col_integer()
## )
## See spec(...) for full column specifications.

Variables in the dataset

##  [1] "DR Number"              "Date Reported"         
##  [3] "Date Occurred"          "Time Occurred"         
##  [5] "Area ID"                "Area Name"             
##  [7] "Reporting District"     "Crime Code"            
##  [9] "Crime Code Description" "MO Codes"              
## [11] "Victim Age"             "Victim Sex"            
## [13] "Victim Descent"         "Premise Code"          
## [15] "Premise Description"    "Weapon Used Code"      
## [17] "Weapon Description"     "Status Code"           
## [19] "Status Description"     "Crime Code 1"          
## [21] "Crime Code 2"           "Crime Code 3"          
## [23] "Crime Code 4"           "Address"               
## [25] "Cross Street"           "Location"

Above are the 26 variables available in the LA crime data from https://data.lacity.org/A-Safe-City/Crime-Data-From-2010-to-Present/y8tr-7khq

Number of crimes in this dataset

## [1] 1570615

There are 1,570,615 crimes (one per row) in this dataset.
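The real code lives in the RMD file. As a rough sketch of the idea, base R's `names()`, `ncol()` and `nrow()` give the variable list and the counts above. The three-row `toy_crimes` data frame is made up for illustration. On the real `LAcrimesdata`, the same calls would return the names and row count shown above.

```r
# Hypothetical three-row stand-in for LAcrimesdata (values are made up)
toy_crimes <- data.frame(
  `DR Number` = c("100001", "100002", "100003"),
  `Area Name` = c("Central", "Hollywood", "Central"),
  check.names = FALSE  # keep spaces in column names, like read_csv() does
)

print(names(toy_crimes))       # variable names
n_vars <- ncol(toy_crimes)     # number of variables
n_crimes <- nrow(toy_crimes)   # number of crimes (one per row)
print(n_crimes)
```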

Splitting Location into Latitude and Longitude

## [1] "(33.9829, -118.3338)" "(34.0454, -118.3157)" "(33.942, -118.2717)" 
## [4] "(33.9572, -118.2717)" "(34.2009, -118.6369)" "(34.0591, -118.2412)"
## [1] "33.9829" "34.0454" "33.942"  "33.9572" "34.2009" "34.0591"
## [1] " -118.3338" " -118.3157" " -118.2717" " -118.2717" " -118.6369"
## [6] " -118.2412"

I split the Location variable into Latitude and Longitude to make it easier to plot on a map later on!
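Here is a minimal base-R sketch of that split. It is not necessarily how the RMD file does it. It assumes every Location string has the `"(lat, lon)"` form shown in the output above. The first two strings come from that output, and the `"(0, 0)"` entry is a hypothetical placeholder row.

```r
# Two Location strings from the output above, plus a hypothetical "(0, 0)" entry
location <- c("(33.9829, -118.3338)", "(34.0454, -118.3157)", "(0, 0)")

# Drop the parentheses, then split each string on the comma
parts <- strsplit(gsub("[()]", "", location), ",", fixed = TRUE)

# Take the first/second piece; trimws() removes the space left after the comma
latitude  <- as.numeric(trimws(vapply(parts, `[`, character(1), 1)))
longitude <- as.numeric(trimws(vapply(parts, `[`, character(1), 2)))
print(latitude)
print(longitude)
```

The leading space in the longitude output above is what is left over after the comma. Trimming it before `as.numeric()` makes the conversion explicit.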

##                  term V1
## 1           DR Number  0
## 2       Date Reported  0
## 3       Date Occurred  0
## 4       Time Occurred  0
## 5             Area ID  0
## 6           Area Name  0
## 7  Reporting District  0
## 8          Crime Code  0
## 9  Status Description  0
## 10            Address  0

Here, I can see which variables have no missing values (the V1 column is the number of missing values in each variable).
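One way to build a table like this is to count the `NA`s in every column with `colSums(is.na(...))`. This is a sketch, not the RMD file's exact code. The small `toy` data frame and its values are hypothetical, and the `n_missing` column name is my own.

```r
# Hypothetical three-row data frame with a few NAs (values are made up)
toy <- data.frame(
  `Area Name`    = c("Central", "Hollywood", "Central"),
  `Victim Age`   = c(34L, NA, 51L),
  `Cross Street` = c(NA, NA, "MAIN ST"),
  check.names = FALSE
)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
na_counts <- colSums(is.na(toy))

# One row per variable with its NA count, then keep the fully populated ones
missing_summary <- data.frame(term = names(na_counts),
                              n_missing = unname(na_counts),
                              stringsAsFactors = FALSE)
complete_vars <- missing_summary$term[missing_summary$n_missing == 0]
print(complete_vars)
```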

Time of Crime

I wanted to test whether crimes truly happen more at night. The first round of graphs shows a lot of peaks, but this data may need a bird’s-eye view first to get an overall picture.
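The hourly counts behind graphs like these can be sketched by turning Time Occurred into an hour of the day. This assumes Time Occurred holds 24-hour `HHMM` strings, which fits the column being read as character above. The six times below are made up.

```r
# Hypothetical Time Occurred values in 24-hour HHMM form (made up)
time_occurred <- c("1200", "0930", "1800", "2345", "0005", "1215")

# Integer-divide by 100 to keep only the hour (e.g. 2345 -> 23)
hour <- as.integer(time_occurred) %/% 100

# Count crimes per hour, keeping empty hours so all 24 appear
crimes_by_hour <- table(factor(hour, levels = 0:23))
busiest_hour <- as.integer(names(which.max(crimes_by_hour)))
print(busiest_hour)
```

`barplot(crimes_by_hour)` then gives a quick bar chart of the 24 hourly counts.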

The bird’s-eye view shows that the largest number of crimes happens around 12 o’clock (noon). Interesting, considering that people are told to avoid going out in the dark! To zoom out even further, I split the 24 hours into day and night. Let’s mark day as between 0600 (inclusive) and 1800 (exclusive), and night as between 1800 (inclusive) and 0600 (exclusive).
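Here is a minimal sketch of that day/night rule, again assuming `HHMM` times. The made-up times sit on either side of the 0600 and 1800 cut-offs, so you can check where the inclusive and exclusive boundaries fall.

```r
# Hypothetical HHMM times chosen to sit on either side of the cut-offs
time_occurred <- c("0559", "0600", "1200", "1759", "1800", "2345")
time_num <- as.integer(time_occurred)

# Day: 0600 <= time < 1800; everything else is night
period <- ifelse(time_num >= 600 & time_num < 1800, "day", "night")
print(table(period))
```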

There are more crimes during the day than at night! This contradicts my initial expectation (and conventional wisdom).

Mapping Crime

## Warning in validateCoords(lng, lat, funcName): Data contains 9 rows with
## either missing or invalid lat/lon values and will be ignored

I’ve been interested in learning how to visualize data geographically, because the demarcations on maps reflect a lot of policymaking and history. However, some crimes in this dataset have missing longitudes and latitudes, recorded as (0, 0). So when I plot everything with the leaflet package (which draws the points on a world map), quite a lot of crimes show up near Africa at (0, 0). That obviously doesn’t make sense, since Los Angeles is nowhere near Africa. Always clean your data!
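Here is a minimal cleaning sketch that drops both the `NA` coordinates and the (0, 0) placeholders before mapping. The `crimes` data frame and its `latitude`/`longitude` column names are hypothetical, and the coordinate values are made up apart from two taken from the output above.

```r
# Hypothetical coordinates: two good rows, one (0, 0) placeholder, one NA
crimes <- data.frame(
  latitude  = c(33.9829, 0, 34.0454, NA),
  longitude = c(-118.3338, 0, -118.3157, -118.2717)
)

# Flag rows with a missing coordinate, then rows sitting exactly at (0, 0)
is_missing     <- is.na(crimes$latitude) | is.na(crimes$longitude)
is_placeholder <- !is_missing & crimes$latitude == 0 & crimes$longitude == 0

# Keep only rows with real coordinates
crimes_clean <- crimes[!is_missing & !is_placeholder, ]
print(crimes_clean)
```

The cleaned rows can then go to leaflet, for example `leaflet(crimes_clean) %>% addTiles() %>% addCircleMarkers(lng = ~longitude, lat = ~latitude)`.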

If you zoom in further, you can see where the crimes are clustered over different years.
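To compare years, you first need a year for each crime. This sketch assumes Date Occurred is stored as `MM/DD/YYYY` text, which I haven't confirmed against this export, so adjust the `format` string if the dates look different. The dates themselves are made up.

```r
# Hypothetical Date Occurred strings, assuming an MM/DD/YYYY layout
date_occurred <- c("03/14/2013", "11/02/2015", "07/30/2013")

occurred <- as.Date(date_occurred, format = "%m/%d/%Y")
year <- as.integer(format(occurred, "%Y"))

# Crimes per year; split(your_data, year) would give one subset per year to map
print(table(year))
```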

Check out this nice set of concentric data points I got from the map when I zoomed into a neighborhood near UCLA: https://lh3.google.com/u/0/d/0B7Wcp9505kpTZzcwb2JtaU4wZnM=w2432-h1296-iv1

Let’s see which areas have the most crimes, setting aside the (0, 0) points. Earlier, we saw that Area Name has no missing values, unlike the latitude/longitude pairs, so counting crimes by area name avoids the coordinate problem altogether.
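Here is a minimal sketch of that count, tallying crimes per area and sorting from most to least. The `area_name` vector is made-up data for illustration, not results from this dataset.

```r
# Hypothetical Area Name values (made up for illustration)
area_name <- c("77th Street", "Southwest", "77th Street",
               "Central", "Southwest", "77th Street")

# Tally crimes per area and sort from most to least
area_counts <- sort(table(area_name), decreasing = TRUE)
top_area <- names(area_counts)[1]
print(area_counts)
```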